Efficient Offline Communication Policies for Factored Multiagent POMDPs
Messias, João V., Spaan, Matthijs, Lima, Pedro U.
Factored Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) form a powerful framework for multiagent planning under uncertainty, but optimal solutions require a rigid history-based policy representation. In this paper we allow inter-agent communication, which turns the problem into a centralized Multiagent POMDP (MPOMDP). We map belief distributions over state factors to an agent's local actions by exploiting structure in the joint MPOMDP policy. The key point is that when the dependencies between the agents' decisions are sparse, an agent's belief over its local state factors is often sufficient to unequivocally identify its optimal action, and communication can be avoided. We formalize these notions by casting the problem in convex optimization form, and present experimental results illustrating the savings in communication that we can obtain.
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
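For a given joint value function, the communication-saving test described above reduces to a set of linear programs: an agent can act without communicating whenever one action's alpha-vector dominates all vectors of other actions over every joint belief consistent with the agent's local marginal. Below is a minimal sketch of such a sufficient test using scipy; the function name, the marginalization-matrix encoding, and the per-vector dominance check are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from scipy.optimize import linprog

def local_action_is_unambiguous(alphas, actions, b_local, marg):
    """Conservative sufficient test: does a local belief pin down one action?

    alphas  : (k, n) array of alpha-vectors of the joint MPOMDP value function
    actions : length-k list mapping each alpha-vector to a joint action
    b_local : (m,) marginal belief over this agent's local state factors
    marg    : (m, n) 0/1 marginalization matrix with marg @ b == b_local
              for every joint belief b consistent with b_local (assumed valid)

    Returns (True, action) if one action is optimal (up to ties) for all
    consistent joint beliefs, else (False, None): communication is needed.
    """
    n = alphas.shape[1]
    # Feasible set: b >= 0, sum(b) = 1, marg @ b = b_local.
    A_eq = np.vstack([np.ones((1, n)), marg])
    b_eq = np.concatenate([[1.0], b_local])

    # Try each vector as a potential dominator of all rival-action vectors.
    for i, a_star in enumerate(actions):
        dominated = False
        for j, a in enumerate(actions):
            if a == a_star:
                continue
            # Does some consistent belief prefer alpha_j over alpha_i?
            # linprog minimizes, so minimize (alpha_i - alpha_j) . b and
            # check whether the minimum is negative.
            res = linprog(alphas[i] - alphas[j], A_eq=A_eq, b_eq=b_eq,
                          bounds=[(0, 1)] * n, method="highs")
            if res.status == 0 and res.fun < -1e-9:
                dominated = True
                break
        if not dominated:
            return True, a_star
    return False, None
```

When the test fails, the sketch returns (False, None), corresponding to the case in which the agent must communicate to disambiguate its action.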
Entropy-regularized Point-based Value Iteration
Delecki, Harrison, Vazquez-Chanlatte, Marcell, Yel, Esen, Wray, Kyle, Arnon, Tomer, Witwicki, Stefan, Kochenderfer, Mykel J.
Model-based planners for partially observable problems must accommodate both model uncertainty during planning and goal uncertainty during objective inference. However, model-based planners may be brittle under these types of uncertainty because they rely on an exact model and tend to commit to a single optimal behavior. Inspired by results in the model-free setting, we propose an entropy-regularized model-based planner for partially observable problems. Entropy regularization promotes policy robustness for planning and objective inference by encouraging policies to be no more committed to a single action than necessary. We evaluate the robustness and objective inference performance of entropy-regularized policies in three problem domains. Our results show that entropy-regularized policies outperform non-entropy-regularized baselines, achieving higher expected returns under modeling errors and higher accuracy during objective inference.
- North America > United States > California > Santa Clara County > Stanford (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > California > Santa Clara County > Santa Clara (0.05)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.79)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.71)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.57)
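As a concrete illustration of the entropy-regularized backup the abstract alludes to, the sketch below replaces the hard max over actions in a point-based backup with a soft (log-sum-exp) maximum at temperature tau, which is what induces policies "no more committed to a single action than necessary". The nearest-neighbour value lookup and the array layouts are simplifying assumptions for the sketch; the paper's planner may implement the backup differently.

```python
import numpy as np

def soft_value(q, tau):
    # tau * logsumexp(q / tau): a smooth upper envelope of the Q-values.
    m = np.max(q)
    return m + tau * np.log(np.sum(np.exp((q - m) / tau)))

def soft_pbvi_backup(B, V, T, O, R, gamma, tau):
    """One entropy-regularized point-based backup over a belief set.

    B : (k, n) belief points;  V : (k,) current values at those points
    T : (a, n, n) transition model  T[a, s, s']
    O : (a, n, o) observation model O[a, s', o]
    R : (a, n)   expected reward    R[a, s]
    """
    V_new = np.empty(len(B))
    for i, b in enumerate(B):
        q = np.empty(T.shape[0])
        for a in range(T.shape[0]):
            q[a] = b @ R[a]
            for o in range(O.shape[2]):
                # Pr(o | b, a) and the updated belief b'.
                unnorm = (b @ T[a]) * O[a, :, o]
                p_o = unnorm.sum()
                if p_o < 1e-12:
                    continue
                b_next = unnorm / p_o
                # Nearest-neighbour value approximation over the point set.
                j = np.argmin(np.linalg.norm(B - b_next, axis=1))
                q[a] += gamma * p_o * V[j]
        V_new[i] = soft_value(q, tau)  # soft max instead of hard max
    return V_new
```

The induced Boltzmann policy, proportional to exp(Q(b, a) / tau), recovers the greedy policy as tau approaches zero.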
A model of communication-enabled traffic interactions
Siebinga, O., Zgonnikov, A., Abbink, D. A.
A major challenge for autonomous vehicles is handling interactive scenarios, such as highway merging, with human-driven vehicles. A better understanding of human interactive behaviour could help address this challenge. Such understanding could be obtained by modelling human behaviour. However, existing modelling approaches predominantly neglect communication between drivers and assume that some drivers in the interaction only respond to others but do not actively influence them. Here we argue that addressing these two limitations is crucial for accurate modelling of interactions. We propose a new computational framework that addresses both. Like game-theoretic approaches, we model the interaction as a whole, rather than modelling an isolated driver who only responds to their environment; unlike game theory, our framework explicitly incorporates communication and bounded rationality. We demonstrate the model in a simplified merging scenario, illustrating that it generates plausible interactive behaviour (e.g., aggressive and conservative merging). Furthermore, human-like gap-keeping behaviour emerged in a car-following scenario directly from risk perception, without the explicit implementation of time or distance gaps in the model's decision-making. These results suggest that our framework is a promising approach to interaction modelling that can support the development of interaction-aware autonomous vehicles.
- Europe > Netherlands > South Holland > Delft (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Transportation > Ground > Road (0.93)
- Automobiles & Trucks (0.92)
- Leisure & Entertainment > Games (0.89)
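The framework is described at the level of risk perception and bounded rationality rather than a specific algorithm, but a toy version of a boundedly rational, risk-sensitive driver can be sketched as a softmax choice over risk-penalized utilities. Everything below (the one-second prediction horizon, the exponential risk term, the parameter names) is a hypothetical illustration, not the authors' model.

```python
import numpy as np

def choose_acceleration(gap, rel_speed, candidates, beta=2.0, risk_weight=5.0):
    """Toy boundedly rational driver: softmax over risk-penalized utilities.

    gap       : current bumper-to-bumper distance to the lead vehicle [m]
    rel_speed : own speed minus lead speed [m/s]
    candidates: iterable of candidate accelerations [m/s^2]
    Returns a probability distribution over the candidate accelerations.
    """
    utils = []
    for a in candidates:
        # Predict the gap a short horizon ahead under constant acceleration.
        dt = 1.0
        future_gap = gap - (rel_speed * dt + 0.5 * a * dt**2)
        # Perceived risk grows as the predicted gap shrinks;
        # progress rewards stronger acceleration.
        risk = np.exp(-future_gap / 10.0)
        utils.append(a - risk_weight * risk)
    utils = np.asarray(utils)
    # Bounded rationality: larger beta means closer to pure maximization.
    p = np.exp(beta * (utils - utils.max()))
    return p / p.sum()
```

Gap-keeping emerges in such a model because the risk term, not an explicit target gap, penalizes closing in on the lead vehicle.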
Accelerated Vector Pruning for Optimal POMDP Solvers
Walraven, Erwin (Delft University of Technology) | Spaan, Matthijs T. J. (Delft University of Technology)
Partially Observable Markov Decision Processes (POMDPs) are powerful models for planning under uncertainty in partially observable domains. However, computing optimal solutions for POMDPs is challenging because of the high computational requirements of POMDP solution algorithms. Several algorithms use a subroutine to prune dominated vectors in value functions, which requires solving a large number of linear programs (LPs) and accounts for a large part of the total running time. In this paper we show how the LPs in POMDP pruning subroutines can be decomposed using a Benders decomposition. The resulting algorithm incrementally adds LP constraints and uses only a small fraction of the constraints. Our algorithm significantly improves the performance of existing pruning methods and of the commonly used incremental pruning algorithm. Our new variant of incremental pruning is the fastest optimal pruning-based POMDP algorithm.
- Europe > Netherlands > South Holland > Delft (0.04)
- North America > Canada > British Columbia > East Kootenay Region > Fernie (0.04)
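The decomposition idea (solve the pruning LP with only a few of its constraints and add violated rows lazily) can be illustrated with the classic witness LP used in vector pruning: find a belief at which a candidate vector improves on every vector in a set. The sketch below implements a simple row-generation loop with scipy's linprog; the paper's actual Benders decomposition differs in how the subproblems generate cuts, so treat this as an assumption-laden approximation of the idea.

```python
import numpy as np
from scipy.optimize import linprog

def witness_belief(w, U, tol=1e-9):
    """Find a belief where vector w improves on every vector in U.

    Solves the pruning LP: max d s.t. b.(w - u) >= d for all u in U,
    b a probability distribution, using incremental row generation.
    Returns (belief, margin) with margin > 0 if w is not dominated,
    otherwise (None, margin).
    """
    n = len(w)
    if len(U) == 0:
        return np.full(n, 1.0 / n), np.inf  # nothing can dominate w
    active = [0]                            # indices of enforced constraints
    while True:
        # Variables x = (b_1..b_n, d); maximize d  <=>  minimize -d.
        c = np.zeros(n + 1)
        c[-1] = -1.0
        A_ub = np.array([np.append(-(w - U[i]), 1.0) for i in active])
        b_ub = np.zeros(len(active))
        A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * n + [(None, None)], method="highs")
        b, d = res.x[:n], res.x[-1]
        # Most violated constraint among the vectors not yet enforced.
        gaps = [(i, (w - U[i]) @ b - d)
                for i in range(len(U)) if i not in active]
        if not gaps:
            break
        i_worst, g = min(gaps, key=lambda t: t[1])
        if g >= -tol:
            break                           # no violated rows: LP is solved
        active.append(i_worst)
    return (b, d) if d > tol else (None, d)
```

In practice only a small fraction of the constraints ever enters `active`, which is the source of the speedup the abstract reports.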
A POMDP Formulation of Proactive Learning
Wray, Kyle Hollins (University of Massachusetts Amherst) | Zilberstein, Shlomo (University of Massachusetts Amherst)
We cast the Proactive Learning (PAL) problem (Active Learning with multiple reluctant, fallible, cost-varying oracles) as a Partially Observable Markov Decision Process (POMDP). The agent selects an oracle at each time step to label a data point, while maintaining a belief over the true underlying correctness of its current dataset's labels. The goal is to minimize labeling costs while considering the value of obtaining correct labels, thus maximizing the accuracy of the resulting classifier. We prove three properties showing that our formulation leads to a structured and bounded-size set of belief points, enabling strong performance of point-based methods for solving the POMDP. Our method is compared with the original three algorithms proposed by Donmez and Carbonell and a simple baseline. We demonstrate that our approach matches or improves upon the original approach within five different oracle scenarios, each on two datasets. Finally, our algorithm provides a general, well-defined mathematical foundation to build upon.
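The core belief maintained by such an agent is, per data point, the probability that its current label is correct. Assuming an oracle that errs independently with a known accuracy (a simplification of the paper's reluctant, fallible, cost-varying oracles), the Bayesian update after one response is a short calculation:

```python
def updated_correctness_belief(prior, oracle_accuracy, answer_agrees):
    """Belief that a data point's current label is correct after one noisy
    oracle response. A sketch of the update, assuming the oracle errs
    independently with the given accuracy.
    """
    p, q = prior, oracle_accuracy
    if answer_agrees:
        # Label correct and oracle right, or label wrong and oracle wrong.
        num = p * q
        den = p * q + (1 - p) * (1 - q)
    else:
        num = p * (1 - q)
        den = p * (1 - q) + (1 - p) * q
    return num / den
```

For example, starting from a prior of 0.7, an agreeing answer from a 0.9-accurate oracle raises the belief to 0.63 / 0.66, roughly 0.95.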
Solving Risk-Sensitive POMDPs With and Without Cost Observations
Hou, Ping (New Mexico State University) | Yeoh, William (New Mexico State University) | Varakantham, Pradeep (Singapore Management University)
Partially Observable Markov Decision Processes (POMDPs) are often used to model planning problems under uncertainty. The goal in Risk-Sensitive POMDPs (RS-POMDPs) is to find a policy that maximizes the probability that the cumulative cost is within some user-defined cost threshold. In this paper, unlike existing POMDP literature, we distinguish between the cases in which costs can and cannot be observed, and show the empirical impact of cost observations. We also introduce a new search-based algorithm to solve RS-POMDPs and show that it is faster and more scalable than existing approaches in two synthetic domains and a taxi domain generated with real-world data.
- North America > United States > New Mexico > Doña Ana County > Las Cruces (0.04)
- Asia > Singapore (0.04)
- Transportation > Ground > Road (1.00)
- Transportation > Passenger (0.89)
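One standard way to make the risk-sensitive objective concrete is to augment the state with the remaining cost budget and compute the maximum probability of reaching a goal within that budget. The sketch below does this for a fully observable, integer-cost simplification; the paper's algorithm handles partial observability and the unobserved-cost case, which this sketch does not attempt.

```python
import numpy as np

def risk_sensitive_vi(T, C, threshold, goal, iters=200):
    """Value iteration on a budget-augmented MDP: a simplified, fully
    observable sketch of the RS objective, maximizing the probability
    that cumulative cost stays within `threshold`.

    T    : (a, s, s') transition probabilities
    C    : (a, s) non-negative integer cost of action a in state s
    goal : set of absorbing goal states
    V[s, w] = max probability of reaching a goal from s with budget w left.
    """
    nA, nS, _ = T.shape
    W = threshold + 1
    V = np.zeros((nS, W))
    for s in goal:
        V[s, :] = 1.0                      # goal reached within budget
    for _ in range(iters):
        for s in range(nS):
            if s in goal:
                continue
            for w in range(W):
                best = 0.0
                for a in range(nA):
                    w2 = w - C[a, s]
                    if w2 < 0:
                        continue           # action would bust the budget
                    best = max(best, T[a, s] @ V[:, w2])
                V[s, w] = best
    return V
```

Whether costs are observed determines whether the remaining budget is itself known to the agent, which is exactly the distinction the paper studies.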
Multi-Objective POMDPs with Lexicographic Reward Preferences
Wray, Kyle Hollins (University of Massachusetts Amherst) | Zilberstein, Shlomo (University of Massachusetts Amherst)
We propose a model, the Lexicographic Partially Observable Markov Decision Process (LPOMDP), which extends POMDPs with lexicographic preferences over multiple value functions. It allows slack (slightly less-than-optimal values) in higher-priority objectives to facilitate improvement in lower-priority value functions. Many real-life situations are naturally captured by LPOMDPs with slack. We consider a semi-autonomous driving scenario in which time spent on the road is minimized while time spent driving autonomously is maximized. We propose two solutions to LPOMDPs, Lexicographic Value Iteration (LVI) and Lexicographic Point-Based Value Iteration (LPBVI), and establish convergence results and correctness within strong slack bounds. We test the algorithms using real-world road data provided by OpenStreetMap (OSM) for 10 major cities. Finally, we present GPU-based optimizations for point-based solvers, demonstrating that they enable us to quickly solve vastly larger LPOMDPs and other variations of POMDPs.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Iceland > Capital Region > Reykjavik (0.04)
- Health & Medicine (0.94)
- Transportation > Ground > Road (0.35)
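The effect of slack can be seen in how actions are filtered at a single belief: each objective, in priority order, discards any action whose value falls more than its slack below the best surviving value, and the next objective chooses among what remains. A minimal sketch, with hypothetical variable names:

```python
import numpy as np

def lexicographic_actions(q_values, slack):
    """Filter actions lexicographically with slack.

    q_values : list of (num_actions,) arrays, highest priority first
    slack    : list of non-negative slacks, one per objective
    Returns the indices of actions surviving all objectives.
    """
    candidates = np.arange(len(q_values[0]))
    for q, delta in zip(q_values, slack):
        best = q[candidates].max()
        # Keep any action within `delta` of the best among survivors.
        candidates = candidates[q[candidates] >= best - delta]
    return candidates
```

In the driving scenario, q_values[0] would hold negated expected travel times and q_values[1] expected autonomous-driving time, so a small slack on the first objective lets the agent trade a slightly longer route for more autonomy.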
Information Gathering and Reward Exploitation of Subgoals for POMDPs
Ma, Hang (McGill University) | Pineau, Joelle (McGill University)
Planning in large partially observable Markov decision processes (POMDPs) is challenging, especially when a long planning horizon is required. A few recent algorithms successfully tackle this case, but at the expense of a weaker information-gathering capacity. In this paper, we propose Information Gathering and Reward Exploitation of Subgoals (IGRES), a randomized POMDP planning algorithm that leverages information in the state space to automatically generate "macro-actions" to tackle tasks with long planning horizons, while locally exploring the belief space to allow effective information gathering. Experimental results show that IGRES is an effective multi-purpose POMDP solver, providing state-of-the-art performance for both long-horizon planning tasks and information-gathering tasks on benchmark domains. Additional experiments with an ecological adaptive management problem indicate that IGRES is a promising tool for POMDP planning in real-world settings.
- North America > Canada > Quebec > Montreal (0.14)
- North America > Canada > Ontario > Toronto (0.14)
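IGRES's two ingredients, subgoal selection and macro-action generation, can be caricatured in a few lines. The sketch below samples subgoals with a softmax over state rewards and builds a macro-action by breadth-first search in the underlying transition graph; the published algorithm's weighting also accounts for reachability and information value, so both functions are loose assumptions rather than the authors' method.

```python
from collections import deque
import numpy as np

def sample_subgoal(states, reward, rng, temp=1.0):
    """Sample a subgoal state, favouring high-reward states."""
    w = np.exp((reward - reward.max()) / temp)
    return rng.choice(states, p=w / w.sum())

def macro_action(start, subgoal, successors):
    """Breadth-first action sequence from start to subgoal in the
    underlying (fully observable) state graph; used as one macro-action.
    `successors(s)` yields (action, next_state) pairs."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s == subgoal:
            break
        for a, s2 in successors(s):
            if s2 not in parent:
                parent[s2] = (s, a)
                queue.append(s2)
    # Walk back from the subgoal to recover the action sequence.
    seq, s = [], subgoal
    while parent.get(s) is not None:
        s, a = parent[s]
        seq.append(a)
    return list(reversed(seq))
```

Executing such macro-actions shortens the effective horizon, while local belief-space exploration between subgoals preserves information gathering.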
Covering Number as a Complexity Measure for POMDP Planning and Learning
Zhang, Zongzhang (University of Science and Technology of China) | Littman, Michael (Rutgers University) | Chen, Xiaoping (University of Science and Technology of China)
Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number of the reachable belief space, the set of belief points that are reachable from the initial belief point, has theoretically interesting links with the complexity of POMDP planning. In this paper, we present empirical evidence that the covering number of the reachable belief space (or just "covering number", for brevity) is a far better complexity measure than state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent algorithm for learning POMDPs without reset, given knowledge of the covering number.
- Asia > China > Anhui Province > Hefei (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
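The covering number itself is straightforward to estimate once reachable beliefs have been sampled, say by simulating random policies from the initial belief: greedily adopt a belief point as a new center whenever it lies farther than delta from all existing centers. This greedy count upper-bounds the delta-covering number of the sampled set; the choice of L1 distance below is an assumption.

```python
import numpy as np

def covering_number(beliefs, delta):
    """Greedy estimate of the delta-covering number of a sampled set of
    reachable belief points (an upper bound for the sampled set)."""
    centers = []
    for b in beliefs:
        # Adopt b as a new center if no existing center is within delta.
        if all(np.abs(b - c).sum() > delta for c in centers):
            centers.append(b)
    return len(centers)
```

A small count on a large state space signals that the reachable belief space, and hence planning and learning difficulty, is far smaller than the state-space size suggests.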